Modeling and Planning with Macro-Actions in Decentralized POMDPs
Similar resources
Planning with macro-actions in decentralized POMDPs
Decentralized partially observable Markov decision processes (Dec-POMDPs) are general models for decentralized decision making under uncertainty. However, they typically model a problem at a low level of granularity, where each agent’s actions are primitive operations lasting exactly one time step. We address the case where each agent has macro-actions: temporally extended actions which may requ...
Approximate Planning in POMDPs with Macro-Actions
Recent research has demonstrated that useful POMDP solutions do not require consideration of the entire belief space. We extend this idea with the notion of temporal abstraction. We present and explore a new reinforcement learning algorithm over grid-points in belief space, which uses macro-actions and Monte Carlo updates of the Q-values. We apply the algorithm to a large-scale robot navigation...
Planning in Decentralized POMDPs with Predictive Policy Representations
We discuss the problem of policy representation in stochastic and partially observable systems, and address the case where the policy is a hidden parameter of the planning problem. We propose an adaptation of the Predictive State Representations (PSRs) to this problem by introducing tests (sequences of actions and observations) on policies. The new model, called the Predictive Policy Representa...
Efficient Planning under Uncertainty with Macro-actions
Deciding how to act in partially observable environments remains an active area of research. Identifying good sequences of decisions is particularly challenging when good control performance requires planning multiple steps into the future in domains with many states. Towards addressing this challenge, we present an online, forward-search algorithm called the Posterior Belief Distribution (PBD)...
Journal
Journal title: Journal of Artificial Intelligence Research
Year: 2019
ISSN: 1076-9757
DOI: 10.1613/jair.1.11418